AITopics | high-dimensional scaling

A Dirty Model for Multi-task Learning

Neural Information Processing SystemsSep-30-2025, 21:08:55 GMT

We consider the multiple linear regression problem, in a setting where some of the set of relevant features could be shared across the tasks. A lot of recent research has studied the use of $\ell_1/\ell_q$ norm block-regularizations with $q > 1$ for such (possibly) block-structured problems, establishing strong guarantees on recovery even under high-dimensional scaling where the number of features scale with the number of observations. However, these papers also caution that the performance of such block-regularized methods are very dependent on the {\em extent} to which the features are shared across tasks. Indeed they show~\citep{NWJoint} that if the extent of overlap is less than a threshold, or even if parameter {\em values} in the shared features are highly uneven, then block $\ell_1/\ell_q$ regularization could actually perform {\em worse} than simple separate elementwise $\ell_1$ regularization. We are far away from a realistic multi-task setting: not only do the set of relevant features have to be exactly the same across tasks, but their values have to as well.

dirty model, multi-task learning, name change, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.56)

Add feedback

A Dirty Model for Multi-task Learning

Ali Jalali, Sujay Sanghavi, Chao Ruan, Pradeep K. Ravikumar

Neural Information Processing SystemsFeb-11-2025, 17:55:02 GMT

We consider multi-task learning in the setting of multiple linear regression, and where some relevant features could be shared across the tasks.

artificial intelligence, dirty model, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.05)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Illinois (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Add feedback

A Dirty Model for Multi-task Learning Ali Jalali

Neural Information Processing SystemsMar-15-2024, 15:23:09 GMT

We consider multi-task learning in the setting of multiple linear regression, and where some relevant features could be shared across the tasks.

dirty model, matrix, regularizer, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.05)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Illinois (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Add feedback

A Dirty Model for Multi-task Learning

Jalali, Ali, Sanghavi, Sujay, Ruan, Chao, Ravikumar, Pradeep K.

Neural Information Processing SystemsFeb-15-2020, 01:29:26 GMT

We consider the multiple linear regression problem, in a setting where some of the set of relevant features could be shared across the tasks. A lot of recent research has studied the use of $\ell_1/\ell_q$ norm block-regularizations with $q 1$ for such (possibly) block-structured problems, establishing strong guarantees on recovery even under high-dimensional scaling where the number of features scale with the number of observations. However, these papers also caution that the performance of such block-regularized methods are very dependent on the {\em extent} to which the features are shared across tasks. Indeed they show \citep{NWJoint} that if the extent of overlap is less than a threshold, or even if parameter {\em values} in the shared features are highly uneven, then block $\ell_1/\ell_q$ regularization could actually perform {\em worse} than simple separate elementwise $\ell_1$ regularization. We are far away from a realistic multi-task setting: not only do the set of relevant features have to be exactly the same across tasks, but their values have to as well.

dirty model, high-dimensional scaling, multi-task learning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Dirty Model for Multi-task Learning

Jalali, Ali, Sanghavi, Sujay, Ruan, Chao, Ravikumar, Pradeep K.

Neural Information Processing SystemsDec-31-2010

We consider the multiple linear regression problem, in a setting where some of the set of relevant features could be shared across the tasks. A lot of recent research has studied the use of $\ell_1/\ell_q$ norm block-regularizations with $q > 1$ for such (possibly) block-structured problems, establishing strong guarantees on recovery even under high-dimensional scaling where the number of features scale with the number of observations. However, these papers also caution that the performance of such block-regularized methods are very dependent on the {\em extent} to which the features are shared across tasks. Indeed they show~\citep{NWJoint} that if the extent of overlap is less than a threshold, or even if parameter {\em values} in the shared features are highly uneven, then block $\ell_1/\ell_q$ regularization could actually perform {\em worse} than simple separate elementwise $\ell_1$ regularization. We are far away from a realistic multi-task setting: not only do the set of relevant features have to be exactly the same across tasks, but their values have to as well. Here, we ask the question: can we leverage support and parameter overlap when it exists, but not pay a penalty when it does not? Indeed, this falls under a more general question of whether we can model such \emph{dirty data} which may not fall into a single neat structural bracket (all block-sparse, or all low-rank and so on). Here, we take a first step, focusing on developing a dirty model for the multiple regression problem. Our method uses a very simple idea: we decompose the parameters into two components and {\em regularize these differently.} We show both theoretically and empirically, our method strictly and noticeably outperforms both $\ell_1$ and $\ell_1/\ell_q$ methods, over the entire range of possible overlaps. We also provide theoretical guarantees that the method performs well under high-dimensional scaling.

artificial intelligence, dirty model, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Add feedback

Phase transitions for high-dimensional joint support recovery

Negahban, Sahand, Wainwright, Martin J.

Neural Information Processing SystemsDec-31-2009

We consider the following instance of transfer learning: given a pair of regression problems, suppose that the regression coefficients share a partially common support, parameterized by the overlap fraction $\overlap$ between the two supports. This set-up suggests the use of $1, \infty$-regularized linear regression for recovering the support sets of both regression vectors. Our main contribution is to provide a sharp characterization of the sample complexity of this $1,\infty$ relaxation, exactly pinning down the minimal sample size $n$ required for joint support recovery as a function of the model dimension $\pdim$, support size $\spindex$ and overlap $\overlap \in [0,1]$. For measurement matrices drawn from standard Gaussian ensembles, we prove that the joint $1,\infty$-regularized method undergoes a phase transition characterized by order parameter $\orpar(\numobs, \pdim, \spindex, \overlap) = \numobs{(4 - 3 \overlap) s \log(p-(2-\overlap)s)}$. More precisely, the probability of successfully recovering both supports converges to $1$ for scalings such that $\orpar > 1$, and converges to $0$ to scalings for which $\orpar < 1$. An implication of this threshold is that use of $1, \infty$-regularization leads to gains in sample complexity if the overlap parameter is large enough ($\overlap > 2/3$), but performs worse than a naive approach if $\overlap < 2/3$. We illustrate the close agreement between these theoretical predictions, and the actual behavior in simulations. Thus, our results illustrate both the benefits and dangers associated with block-$1,\infty$ regularization in high-dimensional inference.

artificial intelligence, machine learning, regression problem, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)

Add feedback